BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to computer vision systems, more
particularly to a system having computationally efficient real-time
object detection, tracking, and zooming capabilities.

2. Description of Prior Art

Recent advancements in processing and sensing performance are
facilitating increased development of real-time video surveillance
and monitoring systems. The development of computer vision systems
that meet application specific computational and accuracy needs is
important to the deployment of real-life computer vision systems.
Such a computer vision system has not yet been realized.

Past works have addressed methodological issues and have
demonstrated performance analysis of components and systems.
However, it is still an art to engineer systems that meet given
application needs in terms of computational speed and accuracy. The
trend in the art is to emphasize statistical learning methods, more
particularly Bayesian methods for solving computer vision problems.
However, there still exists the problem of choosing the right
statistical likelihood model and the right priors to suit the needs
of an application. Moreover, it is still computationally difficult
to satisfy real-time application needs.
Sequential decomposition of the total task into manageable sub-tasks
(with reasonable computational complexity) and the introduction of
pruning thresholds is one method to solve the problem. Yet, this
introduces additional problems because of the difficulty in
approximating the probability distributions of observables at the
final step of the system so that Bayesian inference is plausible.
This approach to perceptual Bayesian inference is described, for
example, in V. Ramesh et al., "Computer Vision Performance
Characterization," RADIUS: Image Understanding for Imagery
Intelligence, edited by O. Firschein and T. Strat, Morgan Kaufmann
Publishers, San Francisco, 1997, incorporated herein by reference,
and W. Mann and T. Binford, "Probabilities for Bayesian Networks in
Vision," Proceedings of the ARPA IU Workshop, 1994, Vol. 1, pp.
633-643. The work done by Ramesh et al. places an emphasis on
performance characterization of a system, while Mann and Binford
attempted Bayesian inference (using Bayesian networks) for visual
recognition. The idea of gradual pruning of candidate hypotheses to
tame the computational complexity of the estimation/classification
problem has been presented by Y. Amit and D. Geman, "A computational
model for visual selection," Neural Computation, 1999. However, none
of the works identify how the sub-tasks (e.g., feature extraction
steps) can be chosen automatically given an application context.

Therefore, a need exists for a method and apparatus for a
computationally efficient, real-time camera surveillance system with
defined computational and accuracy constraints.

SUMMARY OF THE INVENTION 

The present invention relates to computer vision systems, more
particularly to a system having computationally efficient real-time
detection and zooming capabilities.

According to an embodiment of the present invention, by choosing
system modules and performing an analysis of the influence of
various tuning parameters on the system, a method according to the
present invention performs proper statistical inference,
automatically sets control parameters and quantifies limits of a
dual-camera real-time video surveillance system. The present
invention provides continuous high resolution zoomed-in images of a
person's head at any location in a monitored area.

Preferably, omni-directional camera video is used to detect people
and to precisely control a high resolution foveal camera, which has
pan, tilt and zoom capabilities. The pan and tilt parameters of the
foveal camera and their uncertainties are shown to be functions of
the underlying geometry, lighting conditions, background
color/contrast, relative position of the person with respect to both
cameras, as well as sensor noise and calibration errors. The
uncertainty in the estimates is used to adaptively estimate the zoom
parameter that guarantees with a user specified probability, V, that
the detected person's face is contained and zoomed within the image.

The present invention includes a method for selecting intermediate
transforms (components of the system), as well as processing various
parameters in the system to perform statistical inference,
automatically setting the control parameters and quantifying limits
of a dual-camera real-time video surveillance system.

Another embodiment of the present invention relates to a method for
visually locating and tracking an object through a space. The method
chooses modules for restricting a search function within the space
to regions with a high probability of significant change, the search
function operating on images supplied by a camera. The method also
derives statistical models for errors, including quantifying an
indexing step performed by an indexing module, and tuning system
parameters. Further, the method applies a likelihood model for
candidate hypothesis evaluation and object parameter estimation for
locating the object.

The step of choosing the plurality of modules further includes
applying a calibration module for determining a static scene,
applying an illumination-invariant module for tracking image
transformation, and applying the indexing module for selecting
regions of interest for hypothesis generation. Further, the method
can apply a statistical estimation module for estimating a number of
objects and their positions, and apply a foveal camera control
module for estimating control parameters of a foveal camera based on
location estimates and uncertainties.

Additional modules can be applied by the method, for example, a
background adaptation module for detecting and tracking the object
in dynamically varying illumination situations.

Each module is application specific based on prior distributions for
imposing restrictions on a search function. The prior distributions
include, for example: an object geometry model; a camera geometry
model; a camera error model; and an illumination model.

According to an embodiment of the present invention, the camera is
an omnicamera. Further, the object is tracked using a foveal camera.

The method derives statistical models a number of times to achieve a
given probability of misdetection and false alarm rate. The method
also validates a theoretical model for the space monitored for
determining correctness and closeness to reality. The indexing
module selects regions with a high probability of significant
change, motivated by two dimensional image priors induced by prior
distributions in the space, where the space is three dimensional.

The method of applying a likelihood model includes estimating an
uncertainty of the object's parameters for predicting a system's
performance and for automating control of the system.

In an alternative embodiment the method can be employed in an
automobile, wherein the space includes an interior compartment of
the automobile and/or the exterior of the automobile.

In yet another embodiment of the present invention, a computer
program product is presented. The program product includes computer
program code stored on a computer readable storage medium for
detecting and tracking objects through a space. The computer program
product includes computer readable program code for causing a
computer to choose modules for restricting search functions within a
context to regions with a high probability of significant change
within the space. The computer program product also includes
computer readable program code for causing a computer to derive
statistical models for errors, including quantifying an indexing
step, and tuning system parameters. Further included is computer
readable program code for causing a computer to apply a likelihood
model for candidate hypothesis evaluation and object parameter
estimation within the space.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention will be described
below in more detail with reference to the accompanying drawings:

FIG. 1 is a block diagram showing a method for tracking an object
through a space according to one embodiment of the present
invention;

FIG. 2 is an illustration of a system of cameras for tracking a
person according to one embodiment of the present invention;

FIG. 3 is an illustration of an omni-image including the geometric
relationships between elements of the system while tracking a person
according to one embodiment of the present invention;

FIG. 4 is an illustration of how uncertainties in three dimensional
radial distances influence foveal camera control parameters; and

FIG. 5 is an illustration of the geometric relationship between a
foveal camera and a person.

Throughout the diagrams, like labels in different figures denote
like or corresponding elements or relationships. Further, the
drawings are not to scale.



DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

The present invention solves the problems existing in the prior art
described above, based on the following methods.

System configuration choice: According to one embodiment of the
present invention, modules are chosen for an optical surveillance
system by use of context, in other words, application specific prior
distributions for modules. These modules can include, for example,
object geometry, camera geometry, error models and illumination
models. Real-time constraints are imposed by pruning or indexing
functions that restrict the search space for hypotheses. The choice
of the pruning functions is derived from the application context and
prior knowledge. A proper indexing function will be one that
simplifies computation of the probability of a false hypothesis or
the probability of missing a true hypothesis as a function of the
tuning constraints.

Statistical modeling and performance characterization: According to
an aspect of the present invention, the derivation of statistical
models for errors at various stages in the chosen vision system
configuration assists in quantifying the indexing step. The
parameters are tuned to achieve a given probability of misdetection
and false alarm rate. In addition, a validation of theoretical
models is performed for correctness (through Monte-Carlo
simulations) and closeness to reality (through real experiments).
Hypotheses verification and parameter estimation: Bayesian
estimation is preferably used to evaluate candidate hypotheses and
estimate object parameters by using a likelihood model,
P(measurements/hypothesis), that takes into account the effects of
the pre-processing steps and tuning parameters. In addition, the
uncertainty of the estimate is derived to predict system
performance.

One embodiment of the present invention includes a two camera
surveillance system which continuously provides zoomed-in high
resolution images of the face of a person present in a room. These
images represent the input to higher-level vision modules, e.g.,
face recognition, compaction and event-logging.

In another embodiment, the present invention provides: 1) real-time
performance on a low-cost PC, 2) person misdetection rate of $x_m$,
3) person false-alarm rate of $x_f$, 4) adaptive zooming of a person
irrespective of background scene structure, with the maximal
possible zoom based on the uncertainty of the estimated person
attributes (e.g., location in three dimensions (3D), height, etc.),
with performance of the result characterized by the face resolution
attainable in the area of the face pixel region (as a function of
distance, contrast between background and object, and sensor noise
variance and resolution) and the bias in the centering of the face.
In addition, the method makes assumptions about scene structure, for
example, the scene illumination consists of light sources with
similar spectrum (e.g., identical light sources in an office area),
the number of people to be detected and tracked is bounded, and the
probability of occlusion of persons (due to other persons) is small.

Referring to FIG. 2, to continuously monitor an entire scene, the
present invention uses an omnidirectional sensor including an
omni-camera 205 and a parabolic mirror 206, for example, the OmniCam
of S. Nayar, "Omnidirectional Video Camera," Proceedings of the
DARPA Image Understanding Workshop, Vol. 1, pp. 235-242, 1997. This
camera is preferably mounted below the ceiling 200 looking into the
parabolic mirror located on the ceiling. The parabolic mirror 206
enables the camera 205 to see in all directions simultaneously. Note
that FIG. 2 is an illustration of one embodiment of the present
invention. Other embodiments are contemplated, including, for
example, different mirror alignments, alternative camera designs
(including, for example, catadioptric stereo, panoramic, omni, and
foveal cameras), varying the orientation of the cameras, and
multiple camera systems. The present invention can be employed using
a variety of cameras; calibration modules (discussed below),
including a combination of real world and image measurements,
compensate for different perspectives.

The present invention uses omni-images to detect and estimate the
precise location of a given person's foot in the room, and this
information is used to identify the pan, tilt and zoom settings for
a high-resolution foveal camera. An omni-image is the scene as
viewed from the omni-camera 205, typically in conjunction with a
parabolic mirror 206, mounted preferably on the ceiling 200.

According to one embodiment of the present invention, the choice of
the various estimation steps in the system is motivated by image
priors and real-time requirements. The camera control parameters,
e.g., pan and tilt, are selected based on the location estimate and
its uncertainty (that is derived from statistical analysis of the
estimation steps) so as to center the person's head location in the
foveal image. The zoom parameter is set to the maximum value
possible such that the camera view still encloses the person's head
within the image.

The general Bayesian formulation of the person detection and
location estimation problem does not suit the real-time constraints
imposed by the application. In one embodiment of the present
invention, this formulation is used only after a pruning step. The
pruning step rules out a majority of false alarms by designing an
indexing step motivated by the two dimensional (2D) image priors
(region size, shape, intensity characteristics) induced by the prior
distribution in the 3D scene. The prior distributions for person
shape parameters, including, for example, size, height, and his/her
3D location, are reasonably simple. These priors on the person model
parameters induce 2D spatially variant prior distributions in the
projections, e.g., the region parameters for a given person in the
image depend on the position in the image, and their form depends on
the camera projection model and the 3D object shape. In addition to
shape priors, the image intensity/color priors can be used in the
present invention.

Typically, a method according to the present invention does not make
assumptions about the object intensity, e.g., the homogeneity of the
object, since people can wear a variety of clothing, and the color
spectrum of the light source is therefore not constrained. However,
in an alternative embodiment, in a surveillance application, the
background is typically assumed to be a static scene (or a slowly
time varying scene) with known background statistics. Gaussian
mixtures are typically used to approximate these densities. To
handle shadowing and illumination changes, these distributions are
computed after the calculation of an illumination invariant measure
from a local region in an image. According to the illumination
prior, the illuminants are assumed to have the same but unknown
spectral distribution. Further, the noise model for CCD sensor noise
106 can be specified. This is typically chosen to be i.i.d. zero
mean Gaussian noise in each color band.

In one embodiment of the present invention, the system preferably
includes five functional modules: calibration, illumination-invariant
measure computation at each pixel, indexing functions to select
sectors of interest for hypothesis generation, statistical
estimation of person parameters (e.g., foot location estimation),
and foveal camera control parameter estimation.

Referring to FIG. 1, a block diagram of the transformations applied
to the input is shown. A sensor 100, for example, an omnidirectional
camera, records a scene 105, which preferably is recorded as a color
image. The scene 105 is sent to input 110 as: R(x,y), G(x,y),
B(x,y). The sensor is also subject to sensor noise 106, which will
become part of the input 110.

The input 110, defined above, is transformed 115 ($T: R^3 \to R^2$),
typically to compute an illumination invariant measure $r_c(x,y)$,
$g_c(x,y)$ 120. The statistical model for the distribution of the
invariant measure is influenced by the sensor noise model and the
transformation $T(\cdot)$. The invariant measure mean
($B_0(x,y) = (r_b(x,y), g_b(x,y))$) and covariance matrix
$\Sigma_{B_0}(x,y)$ are computed at each pixel (x,y) from several
samples of R(x,y), G(x,y), B(x,y) for the reference image 121 of the
static scene. A change detection measure $d^2(x,y)$ image 130 is
obtained by computing the Mahalanobis distance 125 between the
current image data values $r_c(x,y)$, $g_c(x,y)$ and the reference
image data $B_0(x,y)$. This distance image is used as input to two
indexing functions P1() 135 and P2() 140. P1() 135 discards radial
lines $\theta$ by choosing hysteresis thresholding parameters 136
that satisfy a given combination of probability of false alarm and
misdetection values, passing the results 137 to P2() 140. P2() 140
discards segments along the radial lines in the same manner, by
choosing hysteresis thresholding parameters 138. The result is a set
of regions with high probability of significant change 141. At this
point the method employs a full-blown statistical estimation
technique 145 that uses the 3D model information 146, camera
geometry information 147, and priors 148 (including objects, shape,
and 3D location) to estimate the number of objects and their
positions 150. The method preferably estimates the control
parameters 155 for the foveal camera based on the location estimates
and uncertainties. Accordingly, the foveal camera is directed by the
control parameters and hysteresis thresholding parameters, for
example, a misdetection threshold.

Additional modules are contemplated by the present invention, for
example, a background adaptation module 111. To generalize the
system and cover outdoor and hybrid illumination situations (indoor
plus outdoor illumination) as well as slowly varying changes in the
static background scene, the present invention incorporates a scheme
described in Chris Stauffer and W.E.L. Grimson, "Adaptive background
mixture models for real-time tracking," Proceedings of the CVPR
conference, 1999, incorporated herein by reference. It can be shown
qualitatively that the statistics for background pixels can be
approximated by a Gamma distribution. The statistics are stable
within a given time window. In the present invention the background
adaptation module is fused with the system without changing the
entire analysis and algorithm, by re-mapping the test statistic
derived from the data so that the cumulative distribution function
of the re-mapped test statistic approximates the cumulative
distribution function of a Chi-square distribution. Therefore, the
result of the Grimson approach is re-mapped pixelwise to obtain
$d_g^2$ in block 112, following the transform described below. By
adding $d_g^2$ 112 (for each pixel) to the $d^2$ value 130 (see
equation 7), a new distance image is obtained. This distance image
can be input to the index function 135.

The output of the background adaptation module 111 
is also used to update the static background statistics, 
as shown in block 121. 

The pixels of the new distance measure are also Chi-square
distributed. The only difference is a rise in the degrees of freedom
from two to three. The analysis remains the same, and the thresholds
are derived as described below. This is an illustration of how
different modules can be fused in an existing framework without
changing the statistical analysis. After reading the present
invention, formulation of these additional modules will be within
the purview of one of ordinary skill in the art.

The projection model for the two cameras is discussed below with
respect to FIGs. 2 through 5. The following geometric model
parameters are denoted as:

• Ho - height of OmniCam above floor (inches)
• Hf - height of foveal camera above floor (inches)
• Hp - person's height (inches)
• Rh - person's head radius (inches)
• Rf - person's foot position in world coordinates (inches)
• Dc - on floor projected distance between cameras (inches)
• p(xc,yc) - position of OmniCam center (in omni-image) (pixel
  coordinates)
• rm - radius of parabolic mirror (in omni-image) (pixels)
• rh - distance of person's head from the OmniCam center (in
  omni-image) (pixels)
• rf - distance of person's foot from the OmniCam center (in
  omni-image) (pixels)
• $\eta$ - angle between the person and the foveal camera relative
  to the OmniCam image center (see FIG. 3)
• $\theta$ - angle between the radial line corresponding to the
  person and the zero reference line (see FIG. 3)

Here, capital variables are variables in 3D, and small variables are
given in image coordinates. During the calibration step (combination
of real world and image measurements) Ho, Hf, Dc, rm and p(xc,yc)
are initialized and the corresponding standard deviations or
tolerances are determined. In a preferred embodiment the calibration
step is performed offline. Heights are typically calculated from the
floor 201 up.

Using the geometric features of an OmniCam 205, including a
parabolic mirror, and under the hypothesis that the person 220 is
standing upright, the relationship between $r_f$, respectively
$r_h$, and $R_p$ can be shown to be:

$$R_p = a H_o \quad \text{with} \quad a = \frac{2 r_m r_f}{r_m^2 - r_f^2} \qquad (1)$$

$$R_p = b (H_o - H_p) \quad \text{with} \quad b = \frac{2 r_m r_h}{r_m^2 - r_h^2} \qquad (2)$$

Let $\alpha$ and $\beta$ be the foveal camera 210 control parameters
for the tilt and pan angles, respectively. Further, let $D_p$ be the
projected real world distance between the foveal camera 210 and the
person 220. Assuming the person's head is approximately located over
his/her feet, and using basic trigonometry in FIGs. 2 and 3, it can
easily be seen that $D_p$, $\alpha$, and $\beta$ satisfy:

$$D_p = \sqrt{D_c^2 + R_p^2 - 2 D_c R_p \cos(\eta)} \qquad (3)$$

$$\tan(\alpha) = \frac{H_p - R_h - H_f}{D_p}; \quad \sin(\beta) = \frac{R_p}{D_p} \sin(\eta) \qquad (4)$$

where $\eta$ is the angle between the person 220 and the foveal
camera 210 relative to the OmniCam 205 position.
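
The geometric relations of this passage can be sketched in Python as follows. This is a minimal illustration rather than the claimed implementation: it assumes the usual parabolic-mirror relation, in which the tangent of the viewing angle equals 2·rm·r/(rm² − r²), assumes that the tilt is measured to the head center at height Hp − Rh, and uses hypothetical function and parameter names (`radial_distance`, `foveal_pan_tilt`, `eta`).

```python
import math

def radial_distance(r_img, r_mirror, height):
    """Floor-plane distance (inches) seen at omni-image radius r_img (pixels).

    Assumed relation: tan(view angle) = 2*rm*r / (rm^2 - r^2).
    `height` is the vertical drop from the OmniCam to the observed point,
    e.g. Ho for the foot or Ho - Hp for the top of the head.
    """
    if not 0 <= r_img < r_mirror:
        raise ValueError("r_img must lie inside the mirror radius")
    scale = 2.0 * r_mirror * r_img / (r_mirror ** 2 - r_img ** 2)
    return scale * height

def foveal_pan_tilt(R_p, D_c, eta, H_p, R_h, H_f):
    """Return (D_p, tilt, pan) in inches and radians.

    D_p follows from the law of cosines in the floor-plane triangle
    OmniCam / foveal camera / person; pan follows from the law of sines.
    asin only resolves pan angles up to 90 degrees (an assumption here).
    """
    D_p = math.sqrt(D_c ** 2 + R_p ** 2 - 2.0 * D_c * R_p * math.cos(eta))
    tilt = math.atan2(H_p - R_h - H_f, D_p)  # assumed sign convention
    ratio = max(-1.0, min(1.0, R_p * math.sin(eta) / D_p))
    pan = math.asin(ratio)
    return D_p, tilt, pan
```

For instance, a foot seen at a radius of (√2 − 1)·rm corresponds to a 45 degree viewing angle, so the floor distance equals the camera height.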

This step is the module that takes as input the current color image
(R(x,y), G(x,y), B(x,y)), normalizes it to obtain
$(r_c(x,y), g_c(x,y))$ and compares it with the background
statistical model $(B_0(x,y), \Sigma_{B_0}(x,y))$ to produce an
illumination invariant measure image $d^2(x,y)$. This section
illustrates the derivation of the distribution of $d^2(x,y)$ given
that the input image measurements $\hat{R}$, $\hat{G}$ and $\hat{B}$
are Gaussian with mean R, G, B, and identical standard deviation
$\sigma$.

With respect to FIG. 1, the illumination prior assumption 116 is
that the scene contains multiple light sources with the same
spectral distribution, with no constraint on individual intensities.
To compensate for shadows, which are often present in the image, the
method employs a shadow invariant representation of the color data.
The invariant representation is according to G. Wyszecki and W.S.
Stiles, "Color Science: Concepts and Methods, Quantitative Data and
Formulae," John Wiley & Sons, 1982, incorporated herein by
reference. Accordingly, let S = R+G+B. The illumination normalizing
transform $T: R^3 \to R^2$ appropriate for the method's assumptions
is:

$$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}$$

It can be shown that the uncertainties in the normalized estimates
$\hat{r}$ and $\hat{g}$ are dependent not only on sensor noise
variance, but also on the actual true unknown values of the
underlying samples (due to the non-linearities in the transformation
$T(\cdot)$). Based on the assumption of a moderate signal to noise
ratio (i.e., $\sigma \ll S$), the method approximates
$(\hat{r}, \hat{g})^T$ as having a normal distribution with
pixel-dependent covariance matrix:

$$\begin{pmatrix} \hat{r} \\ \hat{g} \end{pmatrix} \sim N\left( \begin{pmatrix} r \\ g \end{pmatrix}, \Sigma_{\hat{r},\hat{g}} \right), \quad
\Sigma_{\hat{r},\hat{g}} = \frac{\sigma^2}{S^2}
\begin{pmatrix}
1 - \frac{2R}{S} + \frac{3R^2}{S^2} & -\frac{R+G}{S} + \frac{3RG}{S^2} \\
-\frac{R+G}{S} + \frac{3RG}{S^2} & 1 - \frac{2G}{S} + \frac{3G^2}{S^2}
\end{pmatrix} \qquad (5)$$

The values of the entries of this matrix are determined offline for
an entire OmniCam 205 frame, e.g., for each point or pixel on the
image plane 207. These values vary spatially. Note that in the
normalized space the covariance matrix for each pixel is different:
bright regions in the covariance image correspond to regions with
high variance in the normalized image. These regions correspond to
dark regions in RGB space.
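
As a rough illustration of the normalization and of why dark pixels carry larger uncertainty, the sketch below computes the normalized pair (r, g) and a first-order (delta-method) covariance for independent Gaussian noise of equal standard deviation in each color band. The function names are hypothetical, and the covariance expression is the first-order propagation of that noise through r = R/S and g = G/S, which is an assumption of this sketch.

```python
def normalize_rgb(R, G, B):
    """Illumination-normalizing transform: (R, G, B) -> (r, g)."""
    S = R + G + B
    if S <= 0:
        raise ValueError("R + G + B must be positive")
    return R / S, G / S

def normalized_covariance(R, G, B, sigma):
    """First-order covariance of (r, g) for i.i.d. noise sigma per band.

    Obtained by propagating the noise through the partial derivatives
    of r = R/S and g = G/S; valid only when sigma is small against S.
    """
    S = R + G + B
    if S <= 0:
        raise ValueError("R + G + B must be positive")
    scale = sigma ** 2 / S ** 2
    var_r = scale * (1.0 - 2.0 * R / S + 3.0 * R ** 2 / S ** 2)
    var_g = scale * (1.0 - 2.0 * G / S + 3.0 * G ** 2 / S ** 2)
    cov_rg = scale * (-(R + G) / S + 3.0 * R * G / S ** 2)
    return [[var_r, cov_rg], [cov_rg, var_g]]
```

With the same sensor noise, a gray pixel of intensity 10 per band yields a variance one hundred times larger than a gray pixel of intensity 100 per band, which matches the observation that high-variance regions correspond to dark regions in RGB space.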

Since the covariance matrices in the normalized space are
pixel-dependent, a method according to the present invention
calculates the test statistic, i.e., the Mahalanobis distance $d^2$,
that provides a normalized distance measure of a current pixel being
background. Let $\hat{\mu}_b$ be the vector of mean $r_b$ and mean
$g_b$ at a certain background position (mean $b_b$ is redundant, due
to normalization), and $\hat{\mu}_c$ be the corresponding vector of
the current image pixel. Since

$$\hat{\mu}_b - \hat{\mu}_c \sim N\left(0, 2\Sigma_{\hat{r}_b,\hat{g}_b}\right) \qquad (6)$$

the method can define, for each pixel, a metric $d^2$ which
corresponds to the probability that $\hat{\mu}_c$ is a background
pixel:

$$d^2 = (\hat{\mu}_b - \hat{\mu}_c)^T \left(2\Sigma_{\hat{r}_b,\hat{g}_b}\right)^{-1} (\hat{\mu}_b - \hat{\mu}_c) \qquad (7)$$

For background pixels, $d^2$ is approximately $\chi^2$ distributed
with two degrees of freedom. For object pixels, $d^2$ is non-central
$\chi^2$ distributed with two degrees of freedom and non-centrality
parameter c.

To address real-time computational requirements of the application,
the method identifies sectored segments in the image that
potentially contain people of interest. To perform this indexing
step in a computationally efficient manner, the method defines two
index functions P1() and P2() that are applied sequentially as shown
in FIG. 1. Essentially, P1() and P2() are projection operations. For
instance, define $d^2(R,\theta)$ as the change detection measure
image in polar coordinates with coordinate system origin at the
omni-image center $p(x_c,y_c)$. Then, P1() is chosen to be the
projection along radial lines to obtain $M(\theta)$, the test
statistic that can be used to identify changes along a given
direction $\theta$. This test statistic is justified by the fact
that the object projection is approximated by a line-set
(approximated as an ellipse) whose major axis passes through the
omni-image center with a given length distribution that is a
function of the radial foot position coordinates of the person in
the omni-image.
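
The per-pixel change measure that feeds these index functions can be sketched as a Mahalanobis distance between the background mean and the current normalized pixel. The sketch assumes that both are estimated with the same 2x2 covariance, so that their difference has twice that covariance under the background hypothesis; the function name and argument layout are hypothetical.

```python
def mahalanobis_d2(mu_b, mu_c, cov_b):
    """d^2 between background mean mu_b and current pixel mu_c.

    mu_b, mu_c: (r, g) pairs; cov_b: 2x2 covariance of the background
    estimate. The difference is assumed to have covariance 2 * cov_b.
    """
    dr = mu_b[0] - mu_c[0]
    dg = mu_b[1] - mu_c[1]
    a, b = 2.0 * cov_b[0][0], 2.0 * cov_b[0][1]
    c, d = 2.0 * cov_b[1][0], 2.0 * cov_b[1][1]
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    return (dr * (d * dr - b * dg) + dg * (-c * dr + a * dg)) / det
```

A value of $d^2$ near zero indicates a pixel consistent with the background model, while large values indicate significant change.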

This section derives the expressions for the probabilities of false
alarm and misdetection at this step as a function of the input
distributions for $d^2(R,\theta)$, the prior distribution for the
expected fraction of the pixels along a given radial line belonging
to the object, and the noncentrality parameter of $d^2(R,\theta)$ in
object locations.

Let $L_\theta$ be a radial line through $p(x_c,y_c)$, parameterized
by angle $\theta$, and let $M(\theta) = \sum_r d_\theta^2(r)$ denote
the accumulative measure of $d^2$ values at image positions
$p(\theta,r)$, parameterized by angle $\theta$ and distance r in a
polar coordinate system at $p(x_c,y_c)$. Applying Canny's hysteresis
thresholding technique on $M(\theta)$ provides the sectors of
significant change, bounded by left and right angles $\theta_l$ and
$\theta_r$, respectively. Let $N_\theta$ be the total number of
pixels along a radial line $L_\theta$, and k be the expected number
of object pixels along this line. The distribution of k can be
derived from the projection model and the 3D prior models for person
height, size, and position described previously. The distribution of
the cumulative measure is:

Background: $$M(\theta) \sim \chi^2_{2N_\theta}(0) \qquad (8)$$

Object: $$M(\theta) \sim \chi^2_{2(N_\theta - k)}(0) + \chi^2_{2k}(c) \qquad (9)$$

with $c \in [0, \infty)$.

To obtain a false-alarm rate for false sectors of equal or less than
$x_f$%, the method can set the lower threshold $T_l$ so that

$$\int_0^{T_l} \chi^2_{2N_\theta}(\xi)\, d\xi = 1 - \frac{x_f}{100} \qquad (10)$$

To guarantee a misdetection rate of equal or less than $x_m$%,
theoretically, the method can solve for an upper threshold $T_u$
similarly by evaluating the distribution in the object equation
above. Note that $T_u$ is a function of Hp, Rf, and c. Therefore,
the illustrative method would need to know the distributions of Hp,
Rf, and c to solve for $T_u$. Rather than make assumptions about the
distribution of the non-centrality parameter c, the method uses a
look-up table (LUT) $T_u(x_m)$ generated by simulations instead.

The second index function P2() essentially takes as input the domain
corresponding to the radial lines of interest and performs a pruning
operation along the radial lines R. This is done by the computation
of $d_\theta^2(r)$, the integration of the $d^2$ values along
$\theta' = \theta + \pi/2$ (within a finite window whose size is
determined by the prior density of the minor axis of the ellipse
projection), for each point r on the radial line $\theta$. The
derivation of the distribution of the test statistic and the choice
of the thresholds are exactly analogous to the above step.
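
A minimal sketch of the threshold selection and of one-dimensional hysteresis thresholding follows. It assumes that the background statistic along a line of N pixels is a sum of N independent two-degree-of-freedom Chi-square values, so that it is Chi-square with 2N degrees of freedom, for which a closed-form distribution function exists. The upper threshold is taken as given (for example from a simulated look-up table), the profile is treated as non-circular for brevity, and all function names are hypothetical.

```python
import math

def chi2_cdf_even(x, n_pixels):
    """CDF of a central chi-square with 2 * n_pixels degrees of freedom.

    Uses 1 - exp(-x/2) * sum_{j < n} (x/2)^j / j!, suitable for moderate
    n_pixels (very large values may overflow the partial sum).
    """
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, n_pixels):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total

def lower_threshold(n_pixels, false_alarm_percent):
    """Smallest T with P(background statistic <= T) >= 1 - x_f/100."""
    target = 1.0 - false_alarm_percent / 100.0
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n_pixels) < target:
        hi *= 2.0
    for _ in range(100):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, n_pixels) < target:
            lo = mid
        else:
            hi = mid
    return hi

def hysteresis_sectors(profile, t_low, t_high):
    """Index ranges above t_low that contain a sample above t_high."""
    sectors = []
    start = None
    strong = False
    for i, value in enumerate(profile):
        if value > t_low:
            if start is None:
                start, strong = i, False
            strong = strong or value > t_high
        elif start is not None:
            if strong:
                sectors.append((start, i - 1))
            start = None
    if start is not None and strong:
        sectors.append((start, len(profile) - 1))
    return sectors
```

The same two functions could be reused for the pruning along a radial line, with the number of accumulated pixels replaced by the window size used for that step.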

The illustrative method has derived the distributions of
the image measurements and has narrowed the hypotheses for
people location and attributes. The method then performs a
Bayes estimation of person locations and attributes. This
step uses the likelihood models
$L(\hat{d}^2 \mid \text{background})$ and
$L(\hat{d}^2 \mid \text{object})$ along with 2D prior models
for person attributes induced by the 3D object priors
$P(R_p)$, $P(H)$, $P(\theta)$, and $P(S)$. The present
embodiment uses the fact that the probability of occlusion
of a person is small to assert that the probability of a
sector containing multiple people is small. The center
angle $\theta_f$ of a given sector would in this instance
provide the estimate of the major axis of the ellipse
corresponding to the person. It is then sufficient to
estimate the foot location of the person along the radial
line corresponding to $\theta_f$. The center angle
$\theta_f$ of the sector defines the estimate for the
angular component of the foot position. The illustrative
method approximates $\hat{\theta}_f$ as normally
distributed with unknown mean $\theta_f$ and variance
$\sigma^2_{\theta_f}$. The $\theta_f$ values are estimated
as the center positions of the angular sectors given by
P1( ). The standard deviation of a given estimate can be
determined by assuming that the width of the angular
sector gives the 99 percent confidence interval.
Alternatively, this estimate can be obtained through
sampling techniques.
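The sector-width rule can be illustrated with a short sketch. It assumes a two-sided 99 percent interval under a normal model, so that the sector width equals 2z standard deviations with z the corresponding standard normal quantile; the function name `sector_to_angle_estimate` and the use of degrees are assumptions made for illustration.

```python
from statistics import NormalDist


def sector_to_angle_estimate(theta_start, theta_end, confidence=0.99):
    """Return (center angle, standard deviation) for an angular sector.

    Assumption: the sector width spans a two-sided `confidence` interval of
    a normal distribution centered on the sector.
    """
    center = 0.5 * (theta_start + theta_end)
    width = theta_end - theta_start
    # Two-sided quantile: about 2.576 for a 99 percent interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    sigma = width / (2.0 * z)
    return center, sigma


center, sigma = sector_to_angle_estimate(10.0, 30.0)
print(f"center = {center:.1f} deg, sigma = {sigma:.2f} deg")
```

For a sector 20 degrees wide, this gives a standard deviation of a little under 4 degrees.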

Given the line $\theta_f$, it is necessary to estimate the
foot position of the person along this radial line. To
find this estimate and the variance of the radial foot
position $r_f$, the method chooses the hypothesis for the
foot position that minimizes the Bayes error. Let
$P(h_i \mid m)$ denote the posterior probability to be
maximized, where $h_i$ denotes the ith of multiple foot
position hypotheses and m denotes the measurements
($\hat{d}^2_{\theta_f}(r)$), which are statistically
independent; the superscript b or o denotes background or
object, respectively:



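$$
\begin{aligned}
P(h_i \mid m) &= P(h_i^b \mid m^b)\, P(h_i^o \mid m^o)
  = P(h_i^b \mid m^b)\,\bigl(1 - P(\bar{h}_i^o \mid m^o)\bigr) \\
 &= \frac{p(m^b \mid h_i^b)\, P(h_i^b)}{p(m^b)} \cdot
    \frac{p(m^o) - p(m^o \mid \bar{h}_i^o)\, P(\bar{h}_i^o)}{p(m^o)}
\end{aligned}
$$

where p denotes the density function. $P(h_i \mid m)$ becomes
maximal for maximal $p(m^b \mid h_i^b)$ and minimal
$p(m^o \mid \bar{h}_i^o)$, so that

$$
\arg\max_{h_i} \frac{p(m^b \mid h_i^b)}{p(m^o \mid \bar{h}_i^o)}
$$

[The closing expression, a combination of sums of the
measurements over r, is not recoverable from the source.]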

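The selection of a foot hypothesis can be sketched as follows. This is an interpretation, not the disclosed procedure: a hypothesis is scored highly when the samples it labels as background fit a background density well and the samples it labels as object fit that density poorly. The chi-square background density with two degrees of freedom, the fixed object extent, and the name `best_foot_index` are all assumptions.

```python
import math


def log_background_density(d2):
    # Assumption: under the background hypothesis the measurement is
    # chi-square distributed with 2 degrees of freedom, p(x) = 0.5*exp(-x/2).
    return -0.5 * d2 - math.log(2.0)


def best_foot_index(d2_along_line, extent):
    """Return the sample index of the best foot hypothesis on a radial line.

    Hypothesis i labels samples [i, i + extent) as object and all other
    samples as background. With independent samples the log of the ratio
    (background fit) / (object samples explained as background) is a
    difference of sums.
    """
    logp = [log_background_density(v) for v in d2_along_line]
    total = sum(logp)
    best_i, best_score = None, -math.inf
    for i in range(len(logp) - extent + 1):
        object_part = sum(logp[i:i + extent])
        score = (total - object_part) - object_part
        if score > best_score:
            best_i, best_score = i, score
    return best_i


line = [0.5, 1.0, 0.8, 9.0, 12.0, 10.0, 0.7, 0.4]
print(best_foot_index(line, extent=3))  # prints 3
```

With the exponential form assumed here, the score reduces to sums of the measurements over the two regions, so the best hypothesis is the window with the largest total measurement.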


In one embodiment of the present invention, an
estimate of the uncertainty in the foot position is
made. The method provides pdfs up to the latest step in
the algorithm. At this point it is affordable to
simulate the distribution of $\hat{r}_f$ and generate
$\sigma^2_{r_f}$ via perturbation analysis, since only a
few estimates with known distributions are involved in a
few operations. The method can approximate $\hat{r}_f$ as
Gaussian distributed with unknown mean $r_f$ and variance
$\sigma^2_{r_f}$.

Once the foot position $p(\theta_f, r_f)$ is known, the
method can apply formulas 1 through 4 above to estimate
the 3D distances Rp and Dp and the foveal camera control
parameters: tilt $\alpha$, pan $\beta$, and zoom factor z.

FIGs. 4 and 5 illustrate how uncertainties in the 3D
radial distance Rp influence the foveal camera control
parameters. For the following error propagation steps the
method assumes that $\hat{r}_m, \hat{r}_p, \hat{H}_o,
\hat{H}_p, \hat{H}_h, \hat{R}_f$, and $\hat{D}_c$ are
Gaussian random variables with true unknown means
$r_m, r_p, H_o, H_p, H_h, R_f$, and $D_c$, and with
corresponding variances (all estimated in the calibration
phase). The estimates and their uncertainties propagate
through the geometric transformations. The method produces
the final results for the uncertainties in tilt $\alpha$
and pan $\beta$, which are used to calculate the zoom
parameter z (for more details and derivations of these
variances, see M. Greiffenhagen and V. Ramesh,
"Auto-Camera-Man: Multi-Sensor Based Real-Time People
Detection and Tracking System," Technical Report, Siemens
Corporate Research, Princeton, NJ, USA, Nov. 1999):



[Equations (13) and (14): variances of the tilt and pan
estimates. The expressions are not recoverable from the
source.]



Given the uncertainties in the estimates, the method
derives the horizontal and vertical angles of view for the
foveal camera, $\gamma_h$ and $\gamma_v$ respectively,
which map directly to the zoom parameter z. FIGs. 4 and 5
show the geometric relationships for the vertical case.
The following equation provides the vertical angle of
view.

[Equation (15): vertical angle of view $\gamma_v$. The
expression is not recoverable from the source.]

where the factor $f$ solves
$\int_{-f}^{f} N(0,1)\,d\xi = x\%$ for a user-specified
confidence percentile $x\%$ that the head is displayed in
the foveal frame. Similar derivations apply for the
horizontal case.
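The confidence factor described above can be computed directly from the standard normal distribution. The sketch below assumes that the factor is the half-width, in standard deviations, of a symmetric interval holding the requested probability mass; the names `confidence_factor` and `coverage` are hypothetical, and the sketch does not reproduce the angle-of-view formula itself.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def confidence_factor(percent):
    """Half-width f such that the integral of N(0,1) over [-f, f]
    equals percent / 100 (assumed reading of the text)."""
    p = percent / 100.0
    return STANDARD_NORMAL.inv_cdf(0.5 + p / 2.0)


def coverage(f):
    """Probability mass of N(0,1) inside [-f, f], used as a check."""
    return STANDARD_NORMAL.cdf(f) - STANDARD_NORMAL.cdf(-f)


f = confidence_factor(95.0)
print(f"factor = {f:.3f}, coverage = {coverage(f):.3f}")
```

A larger requested percentile yields a larger factor and therefore a wider angle of view, which trades zoom for the assurance that the head stays in the frame.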

The method verifies the correctness of the
theoretical expressions and approximations through
extensive simulations. Only plots validating the
expressions for illumination normalization (eqn. 5) and
for the foveal camera control parameters (eqns. 13, 14)
are shown. This validation assumes correctness of the
underlying statistical models. Validation of the models on
real data is discussed below.

The correctness of the models is verified by
comparing ground truth values against module estimates
for the mean and variance of the running system. In an
illustration of an embodiment of the present invention,
eight positions P1 through P8 are marked, having different
radial distances and pan angles. Positions and test
persons were chosen to simulate different positions,
illumination, and contrast. The table for the final foveal
camera control parameters is for one person. Ground truth
values for the means were taken by measuring the tilt
angle $\alpha$ and the pan angle $\beta$ by hand, and are
compared against the corresponding means of the system
measurements estimated from 100 trials per position and
person. The variances calculated from the system estimates
for pan and tilt angle are compared against the average of
the corresponding variance estimates calculated from the
analysis. The comparison between system output and ground
truth demonstrates the correctness of the model
assumptions in the statistical modeling process (see
Table 1).

Table 1: Validation. The first two rows show the predicted
and experimental variances for the tilt angle,
respectively. The next two rows correspond to the pan
angle.

                                                  P1     P2     P3     P4     P5     P6     P7     P8
Tilt ($\hat{\sigma}^2_{\tan\alpha}$), predicted      2.1    2.12   1.57   1.4    1.35   1.31   1.31   1.32
Tilt ($\hat{\sigma}^2_{\tan\alpha}$), experimental   2.05   2.04   1.6    1.34   1.36   1.32   1.4    1.31
Pan, predicted                                    28.9   26.1   21.3   17.9   15.3   15.2   18.4   20.1
Pan, experimental                                 25.9   24.1   19.5   15.1   14.9   15     18.1   19.3
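The agreement reported in Table 1 can be summarized numerically. The sketch below copies the table values as read from the source and computes, per position, the relative gap between predicted and experimental variance; the choice of relative gap as the summary and the name `relative_gaps` are assumptions made for illustration.

```python
positions = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]

# Values transcribed from Table 1 (units as in the source table).
tilt_predicted = [2.1, 2.12, 1.57, 1.4, 1.35, 1.31, 1.31, 1.32]
tilt_experimental = [2.05, 2.04, 1.6, 1.34, 1.36, 1.32, 1.4, 1.31]
pan_predicted = [28.9, 26.1, 21.3, 17.9, 15.3, 15.2, 18.4, 20.1]
pan_experimental = [25.9, 24.1, 19.5, 15.1, 14.9, 15.0, 18.1, 19.3]


def relative_gaps(predicted, experimental):
    """Absolute difference divided by the predicted value, per position."""
    return [abs(p - e) / p for p, e in zip(predicted, experimental)]


tilt_gaps = relative_gaps(tilt_predicted, tilt_experimental)
pan_gaps = relative_gaps(pan_predicted, pan_experimental)

for name, t, p in zip(positions, tilt_gaps, pan_gaps):
    print(f"{name}: tilt gap {t:.1%}, pan gap {p:.1%}")
```

A summary of this kind makes it easy to see at which positions the predicted variances track the measured ones most closely.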






The performance of the running system will now be
discussed. The output of the foveal camera is sufficient
as input for face recognition algorithms. As an
illustration of how the statistical analysis is used to
optimize the camera setup, eqns. 13 and 14 suggest that
the configuration that minimizes these uncertainties is
the one with a large inter-camera distance Da and a foveal
camera height Ht equal to the mean person eye-level height
Hp.

The present invention is reliable in terms of
detection and zooming over long-duration experiments
within the operational limits denoted by the outer line of
the upper right contour plot.

The setup of the system (for example, placement of 
foveal camera) influences precision globally and locally. 
Preferred directions of low uncertainties can be used to 
adapt the system to user defined accuracy constraints in 
certain areas of the room. 

In another embodiment of the present invention, a
system for monitoring in and around an automobile is
presented. The invention uses an omni-directional
sensor (a standard camera plus a mirror assembly) to
obtain a global view of the surroundings within and
outside the automobile. The omni-camera video is used
for detection and tracking of objects within and around
the automobile. The concept is an extension of the
methods described above with respect to tracking objects
within a room. In this embodiment the system can be used
to improve safety and security.

The video analysis system can include multiple
modules. One example is a calibration module, in which the
center of the omni-camera image is used with height
information of the ceiling of the automobile to translate
image coordinates to ground plane coordinates. Where a
CAD model of the automobile is available, the image
coordinates can be mapped to a 3D point on the interior
of the automobile using this calibration step (if the
automobile is not occupied). Another example is a change
detection module that compares a reference map (reference
image plus variation around the reference image) to the
current observed image map to determine a pixel-based
change detection measure. This is done by transforming
the color video stream into normalized color space (to
deal with illumination variation). The change detection
measure is used to index into a set of possible
hypotheses for object positions and locations. Yet
another example is a background update module for
varying background conditions (e.g., gain control changes,
illumination changes). A further example is a grouping
module that takes the change detection measure along with
a geometric model of the environment and the objects to
identify likely object locations. In the current
embodiment, the method provides the areas in the image
corresponding to the windows and models people by upright
cylinders when they are outside of the automobile. In the
interior of the automobile, people can be modeled by
generalized cylinders. Still another example is an object
tracking module that takes location information over time
to predict object locations in the subsequent time step
and to re-estimate their new locations. Preferably, the
visualization is presented on a color liquid crystal
display (LCD) panel mounted with the rear-view mirror. The
visualization module presents geometrically warped video
of the omni-cam video. This is useful for driver
assistance (e.g., while the driver is backing up or
changing lanes). Other modules are contemplated by the
present invention including, for example, a module that
determines an approaching object's potential threat,
e.g., an object approaching at a higher rate of speed or
from a particular direction.
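The change detection step can be illustrated with a small sketch. It is not the disclosed implementation: it assumes the common normalized chromaticity transform r = R/(R+G+B), g = G/(R+G+B), a per-pixel reference mean and variance for (r, g), and a diagonal covariance. The names `normalized_rg` and `change_measure` are hypothetical.

```python
def normalized_rg(pixel):
    """Map an (R, G, B) pixel to normalized chromaticity (r, g)."""
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:
        # Assumption: a black pixel is treated as neutral chromaticity.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (red / total, green / total)


def change_measure(pixel, ref_mean, ref_var):
    """Squared distance of the pixel chromaticity from the reference,
    scaled by the reference variance (diagonal covariance assumed)."""
    r, g = normalized_rg(pixel)
    return ((r - ref_mean[0]) ** 2 / ref_var[0]
            + (g - ref_mean[1]) ** 2 / ref_var[1])


reference_mean = (1.0 / 3.0, 1.0 / 3.0)   # gray background pixel
reference_var = (1e-4, 1e-4)

bright_gray = change_measure((200, 200, 200), reference_mean, reference_var)
dark_gray = change_measure((50, 50, 50), reference_mean, reference_var)
reddish = change_measure((200, 60, 40), reference_mean, reference_var)
print(bright_gray, dark_gray, reddish)
```

Scaling all three channels by the same factor leaves the chromaticity unchanged, so a brightness change alone produces no response, while a change in color produces a large one.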

According to the automotive embodiment of the
present invention, the OmniCam is a catadioptric system
that includes two parts: a parabolic mirror and a
standard CCD camera looking into it. The invention is
useful as a sensor for driver assistance. It is also
useful for monitoring the surroundings when the
automobile is stationary and for recording videos in the
event that a person approaches the automobile and
attempts to gain unauthorized access. The omni-camera
system can be used in conjunction with a pan-tilt camera
to enable the capture of a zoomed-in image of the persons
involved. Once a person gains unauthorized access to the
automobile and an alarm is triggered, a security system
integrating vision, global positioning system (GPS), and
mobile phone can transmit the time, location, and the
face image of the person to a central security agency.
In addition to the monitoring capability, the ability to
present the panoramic view of the surroundings provides a
method to alert the driver to potential danger in the
surrounding area by visually emphasizing the region in
the panoramic view. In addition, due to the mounting
position of the omni-camera, looking up into a parabolic
mirror located on the ceiling of the automobile
(preferably centered), parts of the surroundings that are
invisible to the driver are visible in the omni-view.
Thus, the driver's blind spot area is significantly
reduced. By evaluating the panoramic view it is possible
to trigger warnings, e.g., if other cars enter a driver's
blind spot. If automobile status information (speed,
steering wheel position, predicted track) is combined
with panoramic video processing, it is possible to alert a
driver to impending dangers or potential accidents.

The present invention contemplates a system and
method for tracking an object. The invention can be
employed in varying circumstances, for example, video
conferencing, distance learning, and security stations
where a user can define an area of interest, thereby
replacing traditional systems employing banks of
monitors. The present invention also contemplates an
application wherein the system is used in conjunction
with a data-log for recording time and location together
with images of persons present. In a data-log
application the system can associate an image with
recorded information upon the occurrence of an event,
e.g., a person sits at a computer terminal within an area
defined for surveillance. The data-log portion of the
system is preferably performed by a computer, where the
computer records, for example, the time, location, and
identity of the subject, as well as an accompanying
image. The present invention is not limited to the above
applications; rather, the invention can be implemented in
any situation where object detection, tracking, and
zooming are needed.

Having described preferred embodiments of the
present invention having computationally efficient
real-time detection and zooming capabilities, it is noted
that modifications and variations can be made by persons
skilled in the art in light of the above teachings. It
is therefore to be understood that changes may be made in
the particular embodiments of the invention disclosed
which are within the scope and spirit of the invention as
defined by the appended claims.

WHAT IS CLAIMED IS:

1. A method for visually locating and tracking an
object through a space, comprising the steps of:

choosing a plurality of modules for restricting a
search function within the space to a plurality of
regions with a high probability of significant change,
the search function operating on images supplied by a
camera;

deriving statistical models for errors, including
quantifying an indexing step performed by an indexing
module, and tuning system parameters; and

applying a likelihood model for candidate hypothesis
evaluation and object parameters estimation for locating
the object.

2. The method of claim 1, wherein the step of
choosing the plurality of modules further comprises the
steps of:

applying a calibration module for determining a
static scene;

applying an illumination-invariant module for
tracking image transformation; and

applying the indexing module for selecting regions
of interest for hypothesis generation.

3. The method of claim 2, further comprising the
steps of:

applying a statistical estimation module for
estimating a number of objects and their positions; and

applying a foveal camera control module for
estimating a plurality of control parameters of a foveal
camera based on location estimates and uncertainties.

4. The method of claim 2, further comprising the
step of applying a background adaptation module for
detecting and tracking the object in dynamically varying
illumination situations.

5. The method of claim 1, wherein each module is
application specific based on a plurality of prior
distributions for imposing restrictions on a search
function.

6. The method of claim 5, wherein the plurality of
prior distributions comprise:

an object geometry model;
a camera geometry model;
a camera error model; and
an illumination model.

7. The method of claim 1, wherein the camera is an
omni camera.

8. The method of claim 1, wherein the object is
tracked using a foveal camera.

9. The method of claim 1, wherein the step of
deriving statistical models is applied a plurality of
times to achieve a given probability of misdetection and
false alarm rate.

10. The method of claim 9, further comprising the step
of validating a theoretical model for the space monitored
for determining correctness and closeness to reality.

11. The method of claim 1, wherein the indexing
module selects a plurality of regions with a high
probability of significant change, motivated by a
plurality of two dimensional image priors induced by a
plurality of prior distributions in the space, wherein
the space is three dimensional.

12. The method of claim 1, wherein the step of
applying a likelihood model further comprises the step of
estimating an uncertainty of the object's parameters for
predicting a system's performance and for automating
control of the system.

13. The method of claim 1, employed in an
automobile wherein the space monitored comprises one of
an interior compartment of the automobile and an exterior
of the automobile.

14. A computer program product comprising computer
program code stored on a computer readable storage medium
for locating and tracking objects through a space,
the computer program product comprising:

computer readable program code for causing a
computer to choose a plurality of modules for
restricting search functions within a context to a
plurality of regions with a high probability of
significant change within the space;

computer readable program code for causing a
computer to derive statistical models for errors,
including quantifying an indexing step, and tuning system
parameters; and

computer readable program code for causing a
computer to apply a likelihood model for candidate
hypothesis evaluation and object parameters estimation
for locating the object.






ABSTRACT 

The present invention relates to a method for
visually detecting and tracking an object through a
space. The method chooses modules for restricting a
search function within the space to regions with a high
probability of significant change, the search function
operating on images supplied by a camera. The method
also derives statistical models for errors, including
quantifying an indexing step performed by an indexing
module, and tuning system parameters. Further, the method
applies a likelihood model for candidate hypothesis
evaluation and object parameters estimation for locating
the object.